Python for Senior Lesson 6

  • v1.0.0 2016.11 by David.Yi
  • v1.1 2020.5 edit by David Yi

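Key points of this lesson

  • for with break, continue, and else: the complete for statement structure
  • Using pprint, the pretty-printing module
  • Python standard library: an introduction to urllib for accessing websites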

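The complete for statement structure

for ... continue, for ... break, for ... else

The continue statement skips the remaining statements of the current iteration and moves on to the next iteration, whereas break leaves the whole loop. continue can be used in both while and for loops.

The break statement terminates the loop.

The else clause runs when the loop finishes normally, that is, when the for loop was not exited through break.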
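
The text notes that continue also works in while loops. A minimal sketch of that case, combined with break, might look like the following; the names odds and n are made up for this illustration.

```python
# Collect the odd numbers from 1 to 10 with a while loop.
odds = []
n = 0
while True:
    n += 1
    if n > 10:
        break       # leave the whole loop
    if n % 2 == 0:
        continue    # skip the rest of this iteration for even numbers
    odds.append(n)

print(odds)  # [1, 3, 5, 7, 9]
```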


In [1]:
# for ... continue example

for ch in 'Python':
    if ch == 'h':
        continue
    print('letter:', ch)


letter: P
letter: y
letter: t
letter: o
letter: n

In [41]:
# for ... continue example
# Remove the lowercase vowels from the input

text = input('Input:')

kept = []

for s in text:
    if s in ['a','e','i','o','u']:
        continue
    kept.append(s)

# Convert the list back to a string
s2 = ''.join(kept)
print('Output:', s2)


Input:hello
Output: hll

In [2]:
# for ... break ... else example
# Check which numbers are prime

import time

a = time.time()

for num in range(3, 20):
    for i in range(2, num):
        if num % i == 0:
            j = num / i
            print(num, i, '*', j)
            break
    else:
        print(num, 'is prime')

b = time.time()
print(b - a)


3 is prime
4 2 * 2.0
5 is prime
6 2 * 3.0
7 is prime
8 2 * 4.0
9 3 * 3.0
10 2 * 5.0
11 is prime
12 2 * 6.0
13 is prime
14 2 * 7.0
15 3 * 5.0
16 2 * 8.0
17 is prime
18 2 * 9.0
19 is prime
0.0005238056182861328
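
The same for ... else pattern can be wrapped in a function. The sketch below is one possible variation, not the lesson's own code: the name is_prime is made up here, and it only tries divisors up to the integer square root, which assumes Python 3.8 or later for math.isqrt.

```python
import math


def is_prime(num):
    """Return True if num is prime, using trial division with for ... else."""
    if num < 2:
        return False
    for i in range(2, math.isqrt(num) + 1):
        if num % i == 0:
            break  # found a divisor, so num is not prime
    else:
        # The loop ended without break: no divisor was found
        return True
    return False


primes = [n for n in range(3, 20) if is_prime(n)]
print(primes)  # [3, 5, 7, 11, 13, 17, 19]
```

The list matches the numbers reported as prime in the output above.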

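Introduction to pprint

The pprint module pretty-prints Python data structures in a layout that is easy to read, for example by putting each element of a long list on its own line when the list does not fit on one line.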


In [2]:
import sys
print(sys.path)


['', '/Users/yijun/anaconda/lib/python34.zip', '/Users/yijun/anaconda/lib/python3.4', '/Users/yijun/anaconda/lib/python3.4/plat-darwin', '/Users/yijun/anaconda/lib/python3.4/lib-dynload', '/Users/yijun/anaconda/lib/python3.4/site-packages/Sphinx-1.3.5-py3.4.egg', '/Users/yijun/anaconda/lib/python3.4/site-packages/setuptools-20.3-py3.4.egg', '/Users/yijun/anaconda/lib/python3.4/site-packages', '/Users/yijun/anaconda/lib/python3.4/site-packages/aeosa', '/Users/yijun/anaconda/lib/python3.4/site-packages/IPython/extensions', '/Users/yijun/.ipython']

In [4]:
import sys
import pprint
pprint.pprint(sys.path)


['',
 '/Users/yijun/anaconda/lib/python34.zip',
 '/Users/yijun/anaconda/lib/python3.4',
 '/Users/yijun/anaconda/lib/python3.4/plat-darwin',
 '/Users/yijun/anaconda/lib/python3.4/lib-dynload',
 '/Users/yijun/anaconda/lib/python3.4/site-packages/Sphinx-1.3.5-py3.4.egg',
 '/Users/yijun/anaconda/lib/python3.4/site-packages/setuptools-20.3-py3.4.egg',
 '/Users/yijun/anaconda/lib/python3.4/site-packages',
 '/Users/yijun/anaconda/lib/python3.4/site-packages/aeosa',
 '/Users/yijun/anaconda/lib/python3.4/site-packages/IPython/extensions',
 '/Users/yijun/.ipython']
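
pprint is not limited to lists. As a small sketch, the example below formats a nested dictionary; the dictionary course is made-up sample data, and width=40 is an arbitrary choice to force wrapping. pprint.pformat returns the formatted text as a string instead of printing it.

```python
import pprint

# Made-up sample data for illustration
course = {
    'name': 'Python for Senior',
    'lesson': 6,
    'topics': ['for / break / continue / else', 'pprint', 'urllib'],
}

# pformat returns the pretty-printed text; pprint.pprint would print it directly
text = pprint.pformat(course, width=40)
print(text)
```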

In [5]:
import sys
sys.path


Out[5]:
['',
 '/Users/yijun/anaconda/lib/python34.zip',
 '/Users/yijun/anaconda/lib/python3.4',
 '/Users/yijun/anaconda/lib/python3.4/plat-darwin',
 '/Users/yijun/anaconda/lib/python3.4/lib-dynload',
 '/Users/yijun/anaconda/lib/python3.4/site-packages/Sphinx-1.3.5-py3.4.egg',
 '/Users/yijun/anaconda/lib/python3.4/site-packages/setuptools-20.3-py3.4.egg',
 '/Users/yijun/anaconda/lib/python3.4/site-packages',
 '/Users/yijun/anaconda/lib/python3.4/site-packages/aeosa',
 '/Users/yijun/anaconda/lib/python3.4/site-packages/IPython/extensions',
 '/Users/yijun/.ipython']

In [48]:
import sys

print ('[')

for i in sys.path:
    print("'",i,"'",",")


[
'  ' ,
' /Users/yijun/anaconda/lib/python35.zip ' ,
' /Users/yijun/anaconda/lib/python3.5 ' ,
' /Users/yijun/anaconda/lib/python3.5/plat-darwin ' ,
' /Users/yijun/anaconda/lib/python3.5/lib-dynload ' ,
' /Users/yijun/anaconda/lib/python3.5/site-packages/Sphinx-1.3.5-py3.5.egg ' ,
' /Users/yijun/anaconda/lib/python3.5/site-packages/setuptools-20.3-py3.5.egg ' ,
' /Users/yijun/anaconda/lib/python3.5/site-packages ' ,
' /Users/yijun/anaconda/lib/python3.5/site-packages/aeosa ' ,
' /Users/yijun/anaconda/lib/python3.5/site-packages/IPython/extensions ' ,
' /Users/yijun/.ipython ' ,

In [49]:
import sys
print(type(sys.path))


<class 'list'>

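The urllib package

urllib provides a set of functions for working with URLs.

The request module of urllib makes it easy to fetch the content of a URL: it sends a GET request to the given page and returns the HTTP response.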


In [3]:
# Read a site's page source with urllib

import urllib.request

response = urllib.request.urlopen('http://sogou.com')
html = response.read()
print(type(html))
print(html)


<class 'bytes'>
b'<!DOCTYPE html>\r\n<html lang="cn">\r\n<head>\r\n    <script>\r\n    window._speedMark = new Date();\r\n        window.lead_ip = \'58.38.242.68\';\r\n</script>    <meta charset="utf-8">\r\n<link rel="dns-prefetch" href="//img01.sogoucdn.com"><link rel="dns-prefetch" href="//img02.sogoucdn.com"><link rel="dns-prefetch" href="//img03.sogoucdn.com"><link rel="dns-prefetch" href="//img04.sogoucdn.com"><link rel="dns-prefetch" href="//dl.web.sogoucdn.com">\r\n<title>\xe6\x90\x9c\xe7\x8b\x97\xe6\x90\x9c\xe7\xb4\xa2\xe5\xbc\x95\xe6\x93\x8e - \xe4\xb8\x8a\xe7\xbd\x91\xe4\xbb\x8e\xe6\x90\x9c\xe7\x8b\x97\xe5\xbc\x80\xe5\xa7\x8b</title>\r\n<link rel="shortcut icon" href="//www.sogou.com/images/logo/new/favicon.ico?v=2" type="image/x-icon">\r\n<meta http-equiv="X-UA-Compatible" content="IE=Edge">\r\n<meta name="keywords" content="\xe6\x90\x9c\xe7\x8b\x97\xe6\x90\x9c\xe7\xb4\xa2,\xe7\xbd\x91\xe9\xa1\xb5\xe6\x90\x9c\xe7\xb4\xa2,\xe5\xbe\xae\xe4\xbf\xa1\xe6\x90\x9c\xe7\xb4\xa2,\xe8\xa7\x86\xe9\xa2\x91\xe6\x90\x9c\xe7\xb4\xa2,\xe5\x9b\xbe\xe7\x89\x87\xe6\x90\x9c\xe7\xb4\xa2,\xe9\x9f\xb3\xe4\xb9\x90\xe6\x90\x9c\xe7\xb4\xa2,\xe6\x96\xb0\xe9\x97\xbb\xe6\x90\x9c\xe7\xb4\xa2,\xe8\xbd\xaf\xe4\xbb\xb6\xe6\x90\x9c\xe7\xb4\xa2,\xe9\x97\xae\xe7\xad\x94\xe6\x90\x9c\xe7\xb4\xa2,\xe7\x99\xbe\xe7\xa7\x91\xe6\x90\x9c\xe7\xb4\xa2,\xe8\xb4\xad\xe7\x89\xa9\xe6\x90\x9c\xe7\xb4\xa2">\r\n<meta name="description" 
content="\xe4\xb8\xad\xe5\x9b\xbd\xe6\x9c\x80\xe9\xa2\x86\xe5\x85\x88\xe7\x9a\x84\xe4\xb8\xad\xe6\x96\x87\xe6\x90\x9c\xe7\xb4\xa2\xe5\xbc\x95\xe6\x93\x8e\xef\xbc\x8c\xe6\x94\xaf\xe6\x8c\x81\xe5\xbe\xae\xe4\xbf\xa1\xe5\x85\xac\xe4\xbc\x97\xe5\x8f\xb7\xe3\x80\x81\xe6\x96\x87\xe7\xab\xa0\xe6\x90\x9c\xe7\xb4\xa2\xef\xbc\x8c\xe9\x80\x9a\xe8\xbf\x87\xe7\x8b\xac\xe6\x9c\x89\xe7\x9a\x84SogouRank\xe6\x8a\x80\xe6\x9c\xaf\xe5\x8f\x8a\xe4\xba\xba\xe5\xb7\xa5\xe6\x99\xba\xe8\x83\xbd\xe7\xae\x97\xe6\xb3\x95\xe4\xb8\xba\xe6\x82\xa8\xe6\x8f\x90\xe4\xbe\x9b\xe6\x9c\x80\xe5\xbf\xab\xe3\x80\x81\xe6\x9c\x80\xe5\x87\x86\xe3\x80\x81\xe6\x9c\x80\xe5\x85\xa8\xe7\x9a\x84\xe6\x90\x9c\xe7\xb4\xa2\xe6\x9c\x8d\xe5\x8a\xa1\xe3\x80\x82">    <link rel="stylesheet" type="text/css" href="/index/css/base.css?v=20161027">\r\n<style>.wrapper .suggestion{width: 614px}.wrapper .nobg .suglist{width: 614px;_width: 611px;}.wrapper .suglist {width: 195px;}</style>\r\n</head>\r\n<body >\r\n        <div class="wrapper" id="wrap">\r\n        <div class="header">\r\n            <div class="top-nav">\r\n    <ul>\r\n        <li><a onclick="st(this,\'40030300\',\'news\')" href="http://news.sogou.com" uigs-id="nav_news" id="news">\xe6\x96\xb0\xe9\x97\xbb</a></li>\r\n        <li class="cur"><span>\xe7\xbd\x91\xe9\xa1\xb5</span></li>\r\n        <li><a onclick="st(this,\'73141200\',\'weixin\')" href="http://weixin.sogou.com/" uigs-id="nav_weixin" id="weixinch">\xe5\xbe\xae\xe4\xbf\xa1</a></li>\r\n        <li><a onclick="st(this,\'40051200\',\'zhihu\')" href="http://zhihu.sogou.com/" uigs-id="nav_zhihu" id="zhihu">\xe7\x9f\xa5\xe4\xb9\x8e</a></li>\r\n        <li><a onclick="st(this,\'40030500\',\'pic\')" href="http://pic.sogou.com" uigs-id="nav_pic" id="pic">\xe5\x9b\xbe\xe7\x89\x87</a></li>\r\n        <li><a onclick="st(this,\'40030600\',\'video\')" href="http://v.sogou.com/" uigs-id="nav_v" id="video">\xe8\xa7\x86\xe9\xa2\x91</a></li>\r\n        <li><a href="http://mingyi.sogou.com?fr=common_index_nav" 
uigs-id="nav_mingyi" id="mingyi" onclick="st(this,\'\',\'myingyi\')">\xe6\x98\x8e\xe5\x8c\xbb</a></li>\r\n        <li><a href="http://english.sogou.com?fr=common_index_nav" uigs-id="nav_english" id="english" onclick="st(this,\'\',\'english\')">\xe8\x8b\xb1\xe6\x96\x87</a></li>\r\n        <li><a href="http://scholar.sogou.com?fr=common_index_nav" uigs-id="nav_scholar" id="scholar" onclick="st(this,\'\',\'scholar\')">\xe5\xad\xa6\xe6\x9c\xaf</a></li>\r\n        <li class="show-more">\r\n            <a href="javascript:void(0);" id="more-product">\xe6\x9b\xb4\xe5\xa4\x9a</a>\r\n            <div class="pos-more" id="products-box">\r\n                <span class="ico-san"></span>\r\n                <a onclick="st(this,\'40031000\')" href="http://map.sogou.com" uigs-id="nav_map" id="map">\xe5\x9c\xb0\xe5\x9b\xbe</a>\r\n                <a onclick="st(this,\'web2ww\')" href="http://wenwen.sogou.com/" uigs-id="nav_wenwen" id="index_more_wenwen">\xe9\x97\xae\xe9\x97\xae</a>\r\n                <a onclick="st(this,\'40031500\')" href="http://gouwu.sogou.com/" uigs-id="nav_gouwu" id="index_more_gouwu">\xe8\xb4\xad\xe7\x89\xa9</a>\r\n                <a onclick="st(this,\'40051203\')" href="http://baike.sogou.com/Home.v" uigs-id="nav_baike" id="index_more_baike">\xe7\x99\xbe\xe7\xa7\x91</a>\r\n                <a onclick="st(this)" href="http://zhishi.sogou.com" uigs-id="nav_zhishi" id="index_more_zhishi">\xe7\x9f\xa5\xe8\xaf\x86</a>\r\n                <a onclick="st(this,\'40050200\')" href="http://mp3.sogou.com/" uigs-id="nav_music" id="index_more_music">\xe9\x9f\xb3\xe4\xb9\x90</a>\r\n                <a onclick="st(this,\'40051205\')" href="http://as.sogou.com/" uigs-id="nav_app" id="index_more_appli">\xe5\xba\x94\xe7\x94\xa8</a>\r\n                <span class="all"><a onclick="st(this,\'40051206\')" href="http://www.sogou.com/docs/more.htm?v=1" uigs-id="nav_all" target="_blank">\xe5\x85\xa8\xe9\x83\xa8</a></span>\r\n            </div>\r\n        </li>\r\n    </ul>\r\n</div>    
        <div class="user-box">\r\n    <div class="local-weather" id="local-weather">\r\n        <div class="wea-box" id="cur-weather" style="display: none;"></div>\r\n        <div class="pos-more" id="detail-weather"></div>\r\n    </div>\r\n    <span class="line" id="user-box-line" style="display: none;"></span>\r\n    <div class="user-enter">\r\n        <a href="javascript:void(0);" id="show-card" style="display: none" uigs-id="settings_show-card">\xe6\x98\xbe\xe7\xa4\xba\xe5\x8d\xa1\xe7\x89\x87</a>\r\n                    <a href="javascript:void(0);" uigs-id="settings_change-skin" id="changeSkinBtn">\xe6\x8d\xa2\xe8\x82\xa4</a>\r\n                            <a href="javascript:void(0);" class="enter" id="loginBtn">\xe7\x99\xbb\xe5\xbd\x95</a>            <a href="javascript:void(0);" class="settings" id="settings"></a>\r\n                <div class="pos-more" id="settings-box">\r\n            <span class="ico-san"></span>\r\n                        <a href="/advanced/config.html" uigs-id="settings_config">\xe4\xb8\xaa\xe6\x80\xa7\xe8\xae\xbe\xe7\xbd\xae</a>\r\n                        <a href="/advanced/advanced.html?w=01090100" uigs-id="settings_advanced">\xe9\xab\x98\xe7\xba\xa7\xe6\x90\x9c\xe7\xb4\xa2</a>\r\n            <a href="http://help.sogou.com/?w=01091500&v=1" uigs-id="settings_help">\xe5\xb8\xae\xe5\x8a\xa9</a>\r\n                    </div>\r\n    </div>\r\n</div>        </div>\r\n        <div class="content" id="content">\r\n            <div class="pos-header" id="top-float-bar">\r\n    <div class="part-one"></div>\r\n    <div class="part-two" id="card-tab-layer">\r\n        <div class="c-top" id="top-card-tab"></div>\r\n    </div>\r\n</div>\r\n<div class="logo2" id="logo-s"><span></span></div>            <div class="logo" id="logo-l"><span></span></div>            <div class="search-box" id="search-box">\r\n    <form action="/web" name="sf" id="sf" onsubmit="if(this.query.value==\'\')return false;document.sf._ast.value=Math.round(new 
Date().getTime()/1000);">\r\n                <span class="sec-input-box">\r\n                    <input type="text" class="sec-input active" name="query" id="query" maxlength="100" autocomplete="off" />\r\n                </span>\r\n        <span class="enter-input"><input type="submit" value="" id="stb"></span>\r\n        <input type="hidden" name="_asf" value="www.sogou.com" />\r\n        <input type="hidden" name="_ast" />\r\n        <input type="hidden" name="w" value="01019900" />\r\n        <input type="hidden" name="p" value="40040100" />\r\n        <input type="hidden" name="ie" value="utf8" />\r\n                <input type="hidden" name="from" value="index-nologin" />\r\n            </form>\r\n</div>        </div>\r\n            <div class="card-box" id="card-box" style="display: none;">\r\n    <div class="card-box2" id="card-box2">\r\n        <div class="c-top" id="card-tab-box">\r\n            <a href="javascript:void(0);" id="card-settings" uigs-id="settings_settings-btn" class="shezhi"></a>\r\n            <div class="pos-more" id="card-options">\r\n                <span class="ico-san"></span>\r\n                <a href="javascript:void(0);" uigs-id="settings_close-card" id="close-card">\xe5\x85\xb3\xe9\x97\xad\xe5\x8d\xa1\xe7\x89\x87</a>\r\n            </div>\r\n        </div>\r\n        <div class="c-main" id="card-content"></div>\r\n    </div>\r\n</div>\r\n<div class="loog-more" id="scroll-more" style="display: none;">\r\n    <a href="javascript:void(0);" uigs-id="scroll-more">\xe6\xbb\x9a\xe5\x8a\xa8\xe6\x9f\xa5\xe7\x9c\x8b\xe6\x9b\xb4\xe5\xa4\x9a<br><span class="ico_san"></span></a>\r\n</div>            <div class="ft" id="footer">\r\n    <a href="http://fuwu.sogou.com/" target="_blank" uigs-id="footer_tuiguang">\xe4\xbc\x81\xe4\xb8\x9a\xe6\x8e\xa8\xe5\xb9\xbf</a><span class="line"></span><a href="http://corp.sogou.com/" target="_blank" uigs-id="footer_about">\xe5\x85\xb3\xe4\xba\x8e\xe6\x90\x9c\xe7\x8b\x97</a><span class="line"></span><a 
href="/docs/terms.htm?v=1" target="_blank" uigs-id="footer_disclaimer">\xe5\x85\x8d\xe8\xb4\xa3\xe5\xa3\xb0\xe6\x98\x8e</a><span class="line"></span><a href="http://fankui.help.sogou.com/index.php/web/web/index/type/4" target="_blank"  uigs-id="footer_feedback">\xe6\x84\x8f\xe8\xa7\x81\xe5\x8f\x8d\xe9\xa6\x88</a><br>&copy;&nbsp;<span>2016</span>&nbsp;SOGOU&nbsp;-&nbsp;<a href="http://www.miibeian.gov.cn" target="_blank" class="g">\xe4\xba\xacICP\xe8\xaf\x81050897\xe5\x8f\xb7</a>&nbsp;-&nbsp;<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">\xe4\xba\xac\xe5\x85\xac\xe7\xbd\x91\xe5\xae\x89\xe5\xa4\x8711000002000025\xe5\x8f\xb7</a>\r\n</div>            <div class="kuozhan" id="QRcode-box" >\r\n    <a href="javascript:void(0);" id="miniQRcode"></a>\r\n    <span id="QRcode"></span>\r\n</div>\r\n<a href="javascript:void(0);" class="back-top" id="back-top"></a>    </div>\r\n        <script>\r\n    var SugPara, uigs_para,\r\n        msBrowserName = navigator.userAgent.toLowerCase(),\r\n        msIsSe = false,\r\n        msIsMSearch = false,\r\n        queryinput = document.getElementById(\'query\');\r\n\r\n    uigs_para={\r\n        "uigs_productid": "webapp",\r\n        "type": "webindex_new",\r\n    "stype": "nologin",\r\n        "scrnwi": screen.width,\r\n        "scrnhi": screen.height,\r\n        "uigs_pbtag": "A",\r\n        "uigs_cookie": "SUID,sct",\r\n        "puid": "invaliduser",\r\n        "cards": "",\r\n        "cards_sw": "$cardEnabled",\r\n        "skin": "$skinId",\r\n        "skin_sw": "$skinEnable",\r\n        "protocol": location.protocol.toLowerCase() == "https:" ? 
"https" : "http"\r\n    };\r\n\r\n    SugPara = {"enableSug":true,"sugType":"web","domain":"w.sugg.sogou.com","productId":"web","sugFormName":"sf","inputid":"query","submitId":"stb","suggestRid":"01015002","normalRid":"01019900","useParent":0 ,"sugglocation":"index"};\r\n\r\n        \r\n    function mk_con() {\r\n        try {\r\n            window.external.metasearch(\'make_connection\', \'www.google.com.hk\');\r\n        } catch (e) {}\r\n    }\r\n\r\n    if (/se 2\\.x/i.test(msBrowserName)) {\r\n        msIsSe = true;\r\n    }\r\n\r\n    if (/metasr/i.test(msBrowserName)) {\r\n        msIsMSearch = true;\r\n    }\r\n\r\n    if (queryinput) {\r\n        if (msIsSe && msIsMSearch) {\r\n            if (queryinput.addEventListener) {\r\n                queryinput.addEventListener(\'keypress\', mk_con, false);\r\n                queryinput.addEventListener(\'keydown\', mk_con, false)\r\n            } else if (queryinput.attachEvent) {\r\n                queryinput.attachEvent(\'onkeypress\', mk_con);\r\n                queryinput.attachEvent(\'onkeydown\', mk_con);\r\n            } else {\r\n                queryinput.onkeypress = mk_con;\r\n                queryinput.onkeydown = mk_con;\r\n            }\r\n        }\r\n    }\r\n\r\n    window.m_s_index = function() {\r\n        var w = document.sf.query,\r\n                c = Math.round((new Date().getTime() + Math.random()) * 1000);\r\n\r\n        w.focus();\r\n\r\n        if(new RegExp("kw=([^&]+)").test(location.search)) {\r\n            if(w.value.length == 0) {\r\n                w.value = decodeURIComponent(RegExp.$1);\r\n            }\r\n        }\r\n\r\n        if (document.cookie.indexOf("SUV=") < 0) {\r\n            document.cookie = "SUV=" + c + ";path=/;expires=Sun, 29 July 2026 00:00:00 UTC;domain=sogou.com"\r\n        }\r\n    };\r\n\r\n    function st(self, p, product, anchor) {\r\n        var searchBox = document.sf.query,\r\n            query = encodeURIComponent(searchBox.value),\r\n\r\n           
 productUrl = {\r\n                "news": \'http://news.sogou.com/news?query=\',\r\n                "web": \'web?query=\',\r\n                "weixin": \'http://weixin.sogou.com/weixin?type=2&query=\',\r\n                "zhihu": \'http://zhihu.sogou.com/zhihu?query=\',\r\n                "pic": \'http://pic.sogou.com/pics?query=\',\r\n                "video": \'http://v.sogou.com/v?query=\',\r\n                "myingyi": \'http://mingyi.sogou.com/mingyi?fr=common_index_nav&query=\',\r\n                "english": \'http://english.sogou.com?fr=common_index_nav&query=\',\r\n                "scholar": \'http://scholar.sogou.com?fr=common_index_nav&query=\'\r\n            },\r\n            newHref = productUrl[product] || self.href;\r\n\r\n        function getConnectSymbol(url) {\r\n            return url.indexOf("?") > -1 ? \'&\' : \'?\';\r\n        }\r\n\r\n        if(searchBox && searchBox.value !== \'\'){\r\n\r\n            if(productUrl[product]) {\r\n                newHref = productUrl[product] + query;\r\n            } else if(newHref.indexOf("kw=") > 0) {\r\n                newHref = newHref.replace(new RegExp("kw=[^&$]*"), "kw=" + query)\r\n            } else {\r\n                newHref += getConnectSymbol(newHref) + \'kw=\' + query;\r\n            }\r\n        }\r\n\r\n        if(p){\r\n            newHref += getConnectSymbol(newHref) + "p=" + p;\r\n        }\r\n\r\n        if (anchor && anchor.length > 0){\r\n            newHref += "#" + anchor;\r\n        }\r\n\r\n        self.href = newHref;\r\n    }\r\n\r\n    window.cid = function(o, p) {\r\n        var w = document.sf.query,\r\n            q = encodeURIComponent(w.value);\r\n\r\n        if (!q) {\r\n            o.href += "?cid=" + p\r\n        } else {\r\n            if (p === "web2ww") {\r\n                o.href += "s/?cid=web2ww&w=" + q\r\n            } else if (p === "web2bk") {\r\n                o.href += "Search.e?sp=S" + q + "&cid=web2bk"\r\n            }\r\n        }\r\n    };\r\n\r\n    
window.m_s_index();\r\n</script>\r\n<script charset="gbk" type="text/javascript" src="/js/sugg_new.v.85.js"></script>\r\n<script src="/js/pb_v.1.9.4.min.js"></script>\r\n<script src="//dl.web.sogoucdn.com/common/lib/jquery/jquery-1.11.0.min.js"></script>\r\n<script src="/js/lib/jquery.mousewheel.min.js"></script>\r\n<script src="/js/lib/juicer-min.js"></script>\r\n<script src="/js/common/widget/login_new.min.v.0.2.js"></script>\r\n<script src="//account.sogou.com/static/api/passport-async.js"></script>\r\n<script src="/index/js/base.js?v=20161026"></script>\r\n<script src="/web/js/taspeed.min.v.0.0.1.js"></script></body>\r\n</html>\r\n<!--zly-->'
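
read() returns bytes, which is why the Chinese text above shows up as \x escape sequences. A minimal sketch of turning such bytes into readable text follows; the helper name decode_body is made up, the sample is a shortened fragment of the title bytes above, and UTF-8 is assumed because the page declares charset="utf-8".

```python
def decode_body(raw, charset='utf-8'):
    """Decode a response body (bytes) into str, replacing undecodable bytes."""
    return raw.decode(charset, errors='replace')


# Shortened fragment of the <title> bytes from the output above
sample = b'<title>\xe6\x90\x9c\xe7\x8b\x97</title>'
print(decode_body(sample))
```

With a live response, the same helper could be applied to the value returned by response.read().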

In [11]:
# Use the readlines() method

import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
html = response.readlines()

# Print only lines 101 to 149 of the response
for i, item in enumerate(html):
    if 100 < i < 150:
        print(item)


b'\r\n'
b'\t\r\n'
b'        \r\n'
b'\t\t\t        \r\n'
b'\t\r\n'
b'\t\t\t        \r\n'
b'\t\r\n'
b'\t\t\t        \r\n'
b'\t\r\n'
b'\t\t\t        \r\n'
b'\t\t\t    \r\n'
b'\r\n'
b'\r\n'
b'\r\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\n'
b'\r\n'
b'\n'
b'<html>\n'
b'<head>\n'
b'    \n'
b'    <meta http-equiv="content-type" content="text/html;charset=utf-8">\n'
b'    <meta http-equiv="X-UA-Compatible" content="IE=Edge">\n'
b'\t<meta content="always" name="referrer">\n'
b'    <meta name="theme-color" content="#2932e1">\n'
b'    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />\n'
b'    <link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /> \n'
b'    <link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg">\n'
b'\t\n'
b'\t\n'
b'\t<link rel="dns-prefetch" href="//s1.bdstatic.com"/>\n'
b'\t<link rel="dns-prefetch" href="//t1.baidu.com"/>\n'

In [8]:
# Read a site's response headers with urllib

import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
info = response.info()
print(info)


Date: Tue, 01 Nov 2016 14:24:17 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: Close
Vary: Accept-Encoding
Set-Cookie: BAIDUID=C9FC84CDD388490492FE90B3537341E7:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BIDUPSID=C9FC84CDD388490492FE90B3537341E7; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=1478010257; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BDSVRTM=0; path=/
Set-Cookie: BD_HOME=0; path=/
Set-Cookie: H_PS_PSSID=1457_21098_18560_21454_21394_21378_21189; path=/; domain=.baidu.com
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Cache-Control: private
Cxy_all: baidu+85949e388d707f7b8522d61e00c1115e
Expires: Tue, 01 Nov 2016 14:24:03 GMT
X-Powered-By: HPHP
Server: BWS/1.1
X-UA-Compatible: IE=Edge,chrome=1
BDPAGETYPE: 1
BDQID: 0x8674d38a000311f7
BDUSERID: 0
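
The object returned by info() is an http.client.HTTPMessage, a subclass of email.message.Message, so individual headers can be looked up by name. The sketch below works offline: it parses a few header lines copied from the output above with email.message_from_string, which stands in for a live response here.

```python
import email

# A few header lines copied from the output above
raw_headers = (
    "Content-Type: text/html; charset=utf-8\n"
    "Server: BWS/1.1\n"
    "Set-Cookie: BDSVRTM=0; path=/\n"
    "Set-Cookie: BD_HOME=0; path=/\n"
    "\n"
)

headers = email.message_from_string(raw_headers)

print(headers['Server'])              # a single header by name
print(headers.get_content_type())     # media type without parameters
print(headers.get_content_charset())  # charset parameter of Content-Type
print(headers.get_all('Set-Cookie'))  # every value of a repeated header
```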



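The requests package

requests is a third-party package widely regarded as one of the most convenient ways to access web content from Python; its API is designed to be user-friendly and helps keep the code short.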


In [5]:
import requests

r = requests.get('http://www.baidu.com')
print(r.content)


b'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>\xe6\x96\xb0\xe9\x97\xbb</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>\xe5\x9c\xb0\xe5\x9b\xbe</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>\xe8\xa7\x86\xe9\xa2\x91</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>\xe8\xb4\xb4\xe5\x90\xa7</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>\xe7\x99\xbb\xe5\xbd\x95</a> </noscript> <script>document.write(\'<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=\'+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" 
: "&")+ "bdorz_come=1")+ \'" name="tj_login" class="lb">\xe7\x99\xbb\xe5\xbd\x95</a>\');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">\xe6\x9b\xb4\xe5\xa4\x9a\xe4\xba\xa7\xe5\x93\x81</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>\xe5\x85\xb3\xe4\xba\x8e\xe7\x99\xbe\xe5\xba\xa6</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2016&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>\xe4\xbd\xbf\xe7\x94\xa8\xe7\x99\xbe\xe5\xba\xa6\xe5\x89\x8d\xe5\xbf\x85\xe8\xaf\xbb</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>\xe6\x84\x8f\xe8\xa7\x81\xe5\x8f\x8d\xe9\xa6\x88</a>&nbsp;\xe4\xba\xacICP\xe8\xaf\x81030173\xe5\x8f\xb7&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>\r\n'

In [10]:
print(r.headers)


{'Transfer-Encoding': 'chunked', 'Content-Encoding': 'gzip', 'Pragma': 'no-cache', 'Connection': 'Keep-Alive', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Server': 'bfe/1.0.8.18', 'Date': 'Fri, 04 Nov 2016 15:05:30 GMT', 'Content-Type': 'text/html', 'Last-Modified': 'Mon, 25 Jul 2016 11:12:41 GMT', 'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform'}

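Downloading files

Both urllib and requests make it easy to fetch images, files, and other resources from a website. The simple examples below download an image file such as the baidu logo.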


In [50]:
# Download a file with urllib
# baidu logo file: http://home.baidu.com/resource/r/home/img/logo-yy.gif

# urlretrieve lives in the urllib.request submodule, which must be imported explicitly
import urllib.request

# url = 'http://home.baidu.com/resource/r/home/img/logo-yy.gif'

# url = 'http://fushanedu.cn/images/LOGO_sy.jpg'

url = 'http://www.jcsy.pudong-edu.sh.cn/News/images/photo_xz.jpg'

urllib.request.urlretrieve(url, "logo3.jpg")
print('download ok')


download ok

In [15]:
# Download a file with requests
# baidu logo file: http://home.baidu.com/resource/r/home/img/logo-yy.gif

import requests

url = 'http://home.baidu.com/resource/r/home/img/logo-yy.gif'

r = requests.get(url)
with open("logo.gif", "wb") as code:
    code.write(r.content)
    print('download ok')


download ok
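
For larger files it can help to stream the response to disk instead of holding it all in memory. The following is a rough sketch using only the standard library; the function name download and the 10 second timeout are assumptions made for this illustration.

```python
import shutil
import urllib.request


def download(url, filename, timeout=10):
    """Stream the resource at url into filename and return filename."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(filename, 'wb') as out_file:
        # Copy the body in chunks rather than reading it all at once
        shutil.copyfileobj(response, out_file)
    return filename


# Example call (needs network access):
# download('http://home.baidu.com/resource/r/home/img/logo-yy.gif', 'logo.gif')
```

The with statement closes both the response and the output file even if the copy fails partway through.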
